
Conversation

susbhere
Contributor

Tensor layout related properties are calculated once, and those cached values are used during per-element offset calculation. This brings a ~200x improvement in wait time between two queries for the PhiSlica model: a user now waits only 0.36 sec (instead of 74 sec!) between two queries. These numbers are from LNL.

JIRA: https://jira.devtools.intel.com/browse/CVS-174810
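
For context, here is a minimal sketch of the caching idea (illustrative only; the CachedLayout struct and its fields are hypothetical names, not the PR's actual code): layout-derived strides and padding are computed once per copy, so the per-element offset becomes a few multiply-adds over cached values instead of re-deriving format and padding information for every element.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Layout-derived values computed once per copy, not once per element.
struct CachedLayout {
    std::array<int64_t, 6> pitches{};    // per-axis strides, in elements
    std::array<int64_t, 6> pad_lower{};  // lower padding per axis
};

// Per-element offset now only reads the cached values.
inline int64_t linear_offset(const CachedLayout& cl, const std::array<int64_t, 6>& idx) {
    int64_t offset = 0;
    for (size_t i = 0; i < idx.size(); ++i)
        offset += (idx[i] + cl.pad_lower[i]) * cl.pitches[i];
    return offset;
}
```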

@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Oct 17, 2025
@susbhere susbhere force-pushed the optimize_padded_copy branch 2 times, most recently from 820f1d4 to 31ed86d on October 17, 2025 11:40
@susbhere susbhere marked this pull request as ready for review October 17, 2025 11:43
@susbhere susbhere requested review from a team as code owners October 17, 2025 11:43
@susbhere
Contributor Author

build_jenkins

@susbhere susbhere force-pushed the optimize_padded_copy branch 6 times, most recently from 98db555 to f3da61f on October 17, 2025 15:25
@susbhere susbhere force-pushed the optimize_padded_copy branch 6 times, most recently from 58fd670 to 8ec08c0 on October 21, 2025 10:21
@susbhere susbhere force-pushed the optimize_padded_copy branch from 8ec08c0 to f056da3 on October 21, 2025 15:17
@susbhere
Contributor Author

build_jenkins

fmt == bfvuwzyx);
}

static void get_axes_map(const format& fmt, int64_t* axes_map, size_t& map_size) {
Contributor

@yeonbok yeonbok Oct 21, 2025


static std::vector<int64_t> get_internal_dims(const format& fmt) const {

to use naming more aligned with the existing API and safer container usage.

Contributor Author


I want to use an array to make this as fast as possible. Even a small performance difference between vector and array adds up to a noticeable difference here. If I used a vector, I would have to copy it into a local array in the caller anyway.
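
For illustration only (not the PR's code), the trade-off being discussed: returning a std::vector heap-allocates on every call, while filling a caller-provided fixed-size buffer avoids any allocation in the hot path.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Vector version: heap-allocates on every call.
std::vector<int64_t> get_axes_map_vec() {
    return {0, 1, 2, 3};
}

// Array version: writes into caller-owned fixed storage, no allocation.
void get_axes_map_arr(int64_t* axes_map, size_t& map_size) {
    const int64_t axes[] = {0, 1, 2, 3};
    map_size = sizeof(axes) / sizeof(axes[0]);
    for (size_t i = 0; i < map_size; ++i)
        axes_map[i] = axes[i];
}
```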

for (int64_t y = 0; y < size.spatial[1]; y++) {
    for (int64_t x = 0; x < size.spatial[0]; x++) {
        *dst++ = static_cast<dst_t>(src[layout.get_linear_offset(cldnn::tensor(b, f, x, y, z, w))]);

void convert_and_copy_padded_source(const src_t* src, dst_t* dst, layout& layout) {
Contributor


This only works for plain formats, so please apply it to plain formats only. For other (e.g., blocked) formats, please use the original method.
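
A hedged sketch of the suggested dispatch (the Layout struct and helper names below are hypothetical stand-ins, not the cldnn API): take the optimized cached-offset path only for plain formats and keep the original per-element path for blocked formats.

```cpp
#include <cstddef>

enum class FormatKind { plain, blocked };

struct Layout {
    FormatKind kind;
    // strides, padding, etc. omitted in this sketch
};

// Stand-in for the optimized cached-offset copy (plain formats only).
template <typename src_t, typename dst_t>
void copy_plain_fast(const src_t* src, dst_t* dst, const Layout&, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = static_cast<dst_t>(src[i]);
}

// Stand-in for the original per-element get_linear_offset() path.
template <typename src_t, typename dst_t>
void copy_generic(const src_t* src, dst_t* dst, const Layout&, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = static_cast<dst_t>(src[i]);
}

template <typename src_t, typename dst_t>
void convert_and_copy(const src_t* src, dst_t* dst, const Layout& layout, size_t n) {
    if (layout.kind == FormatKind::plain)
        copy_plain_fast(src, dst, layout, n);   // fast path for plain formats
    else
        copy_generic(src, dst, layout, n);      // original method for blocked formats
}
```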
